SFR | e-Content Development Program

Two steps in classification

1. Model construction: describing a set of predetermined classes

–Each tuple is assumed to belong to a predefined class, as determined by the class label attribute (supervised learning)

–The set of tuples used for model construction is called the training set

–The model is represented as classification rules, decision trees, or mathematical formulae
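A minimal sketch of step 1 follows. It assumes a small hypothetical training set (the attributes `age` and `student` and the class label `buys_computer` are illustrative, not taken from the slide). It builds a model represented as classification rules using a OneR-style learner, which is chosen here only for brevity. For each attribute, the learner writes one IF-THEN rule per attribute value that predicts the majority class label. It then keeps the attribute whose rules misclassify the fewest training tuples. The slide itself does not prescribe a particular algorithm.

```python
from collections import Counter

# Hypothetical training set: (attribute values, class label) pairs.
# The class label attribute is buys_computer ("yes"/"no"); data are illustrative.
training_set = [
    ({"age": "youth", "student": "no"}, "no"),
    ({"age": "youth", "student": "yes"}, "yes"),
    ({"age": "middle", "student": "no"}, "yes"),
    ({"age": "senior", "student": "yes"}, "yes"),
    ({"age": "senior", "student": "no"}, "no"),
    ({"age": "youth", "student": "no"}, "no"),
    ({"age": "senior", "student": "no"}, "no"),
]

def learn_rules(tuples, attribute):
    """One IF-THEN rule per value of `attribute`: predict the majority
    class label among the training tuples that have that value."""
    labels_by_value = {}
    for attrs, label in tuples:
        labels_by_value.setdefault(attrs[attribute], []).append(label)
    return {value: Counter(labels).most_common(1)[0][0]
            for value, labels in labels_by_value.items()}

def training_errors(tuples, attribute, rules):
    """Number of training tuples the rule set misclassifies."""
    return sum(rules[attrs[attribute]] != label for attrs, label in tuples)

def build_model(tuples):
    """OneR-style construction: keep the single attribute whose rules make
    the fewest errors on the training set."""
    attributes = list(tuples[0][0])
    best = min(attributes,
               key=lambda a: training_errors(tuples, a, learn_rules(tuples, a)))
    return best, learn_rules(tuples, best)

attribute, rules = build_model(training_set)
for value, label in rules.items():
    print(f"IF {attribute} = {value} THEN buys_computer = {label}")
# prints:
# IF student = no THEN buys_computer = no
# IF student = yes THEN buys_computer = yes
```

On this toy data, the rules based on `age` make 2 training errors and the rules based on `student` make 1. The learned model is therefore the pair of rules on `student`. A decision-tree learner would produce a deeper tree from the same training set, but the principle is the same: the model is derived only from labelled training tuples.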

2. Model usage: for classifying previously unseen objects

–Estimate accuracy of the model using a test set

•The known class label of each test sample is compared with the class predicted by the model

•Accuracy rate is the percentage of test set samples that are correctly classified by the model

•The test set must be independent of the training set; otherwise over-fitting goes undetected and the accuracy estimate is overly optimistic
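The sketch below illustrates step 2 under stated assumptions. The accuracy rate is (number of correctly classified test tuples / total number of test tuples) × 100%. To show why the test set must be independent, the example uses a deliberately over-fitted, hypothetical "memorizer" model. This model looks up each training tuple verbatim, including a unique `customer_id`, and otherwise predicts the training set's majority class. The data, the names `DATA`, `holdout_split`, `train_memorizer` and `accuracy`, and the random one-third holdout split are all illustrative assumptions. A random holdout split is only one common way to obtain an independent test set.

```python
import random
from collections import Counter

# Hypothetical labelled tuples: ((customer_id, age, student), class_label).
# customer_id is unique per tuple, so a model can simply memorize it.
DATA = [
    ((1, "youth", "no"), "no"),
    ((2, "youth", "yes"), "yes"),
    ((3, "middle", "no"), "yes"),
    ((4, "senior", "yes"), "yes"),
    ((5, "senior", "no"), "no"),
    ((6, "youth", "no"), "no"),
    ((7, "senior", "no"), "no"),
    ((8, "youth", "yes"), "yes"),
    ((9, "middle", "yes"), "yes"),
    ((10, "senior", "no"), "no"),
    ((11, "youth", "no"), "no"),
]

def holdout_split(tuples, test_fraction=1 / 3, seed=0):
    """Randomly partition the tuples into disjoint training and test sets."""
    shuffled = list(tuples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train_memorizer(training_set):
    """Deliberately over-fitted model: look up each training tuple verbatim and
    fall back to the training set's majority class for anything unseen."""
    table = dict(training_set)
    default = Counter(label for _, label in training_set).most_common(1)[0][0]
    return lambda x: table.get(x, default)

def accuracy(model, labelled_tuples):
    """Accuracy rate: percentage of tuples whose known label matches the prediction."""
    correct = sum(model(x) == label for x, label in labelled_tuples)
    return 100.0 * correct / len(labelled_tuples)

train, test = holdout_split(DATA)
model = train_memorizer(train)
print(f"accuracy on training set: {accuracy(model, train):.1f}%")  # always 100.0%
print(f"accuracy on test set:     {accuracy(model, test):.1f}%")
```

Scored on its own training tuples, the memorizer always reaches 100% because every training tuple is stored verbatim. Scored on the independent test set, it can do no better than guessing the majority class. The gap between the two numbers is the over-fitting that a non-independent test set would hide.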